Skip to content

fix: keep StreamingDataSource alive on non-Eof stream errors#168

Draft
kinyoklion wants to merge 3 commits intomainfrom
rl/sdk-2346/streaming-source-recovers-from-stream-errors
Draft

fix: keep StreamingDataSource alive on non-Eof stream errors#168
kinyoklion wants to merge 3 commits intomainfrom
rl/sdk-2346/streaming-source-recovers-from-stream-errors

Conversation

@kinyoklion
Copy link
Copy Markdown
Member

@kinyoklion kinyoklion commented May 8, 2026

Summary

The StreamingDataSource fetch loop matched any error other than Eof (and recoverable HTTP responses) with a catch-all _ => break, dropping the spawned task and leaving the data source silently dysfunctional. Because nothing was polling the eventsource-client's stream anymore, its reconnect logic never ran. The None branch also broke without notifying the caller.

This PR:

  • Replaces the catch-all break with continue, so the eventsource-client gets to reconnect on the next poll. A warning log keeps transient errors visible.
  • Calls init_complete(false) in the None branch before breaking, so callers waiting on initialization get a signal when the stream is permanently closed.

Context

Surfaced from launchdarkly/rust-server-sdk#116. That report has two contributing bugs; this PR addresses the rust-server-sdk side. The matching rust-eventsource-client side is in launchdarkly/rust-eventsource-client#134 (SDK-2345). Together they restore the reconnection contract: the eventsource-client owns reconnection, and the SDK keeps polling and trusts it.

Tracked in SDK-2346.

Test plan

  • New test streaming_source_recovers_from_non_eof_stream_error (launchdarkly-server-sdk/src/data_source.rs) — sends invalid UTF-8 that triggers Error::InvalidLine from the eventsource-client parser and asserts the mock receives at least two requests, proving the SDK kept polling and the underlying client reconnected.
  • cargo test --package launchdarkly-server-sdk --lib — 300 tests pass; no regressions.
  • cargo fmt --check clean.

Closes #116.

The fetch loop matched any error other than `Eof` (and recoverable
HTTP responses) with a catch-all `_ => break`, dropping the spawned
task and leaving the data source silently dysfunctional. Because no
one was polling the eventsource-client's stream anymore, its reconnect
logic never ran. The `None` branch also broke without notifying the
caller, so callers waiting on initialization had no signal that the
stream had closed permanently.

Replace the catch-all `break` with `continue` so the eventsource-client
gets to reconnect on the next poll, and call `init_complete(false)` in
the `None` branch so the caller can observe permanent stream closure.

Adds `streaming_source_recovers_from_non_eof_stream_error` test --
sends invalid UTF-8 to trigger `Error::InvalidLine` from the parser and
asserts the mock receives at least two requests.

Pairs with launchdarkly/rust-eventsource-client#134, which fixes the
matching state-transition gap on the eventsource-client side. Together
they restore the reconnection contract: the eventsource-client owns
reconnection, and the SDK keeps polling and trusts it.

Closes #116.
@kinyoklion kinyoklion changed the title fix: keep StreamingDataSource alive on non-Eof stream errors [SDK-2346] fix: keep StreamingDataSource alive on non-Eof stream errors May 8, 2026
kinyoklion added 2 commits May 8, 2026 14:45
The 401 path (and any other status that `is_http_error_recoverable`
rejects) should stop the streaming fetch loop and signal init failure
to the caller. The catch-all-error change in this PR sits directly
next to that branch, so add a behavioral test exercising the 401 path
end-to-end through the fetch loop -- existing coverage was only of
the predicate function.
Three follow-ups from review of #168:

* Routes `Error::Eof` to `debug!` so healthy stream rotations don't
  spam `warn!` lines on every reconnect.
* Rewords the catch-all comment and log: some non-Eof errors converge
  via the eventsource-client's `StreamClosed` transition (observed as
  `None` on the next poll), not in-stream reconnect, so "will retry"
  was misleading.
* Strengthens `streaming_source_recovers_from_non_eof_stream_error`
  with an `Arc<Mutex<Option<bool>>>` capture asserting `init_complete`
  is *not* called during the retry window. Permanent-failure init is
  already covered by `streaming_source_stops_on_unrecoverable_http_status`.
// on the next iteration). Breaking here
// would drop this task and leave the data
// source silently dysfunctional.
warn!("error on event stream, will keep polling: {e:?}");
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might change the language here. Customers will think polling means non-streaming.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Data source can shut down with no way to detect it

2 participants